Skip to content

ENH: Add an errors flag to tz_localize #13058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

dancsi
Copy link
Contributor

@dancsi dancsi commented May 2, 2016

closes #13057

Author: dancsi [email protected]

@rockg
Copy link
Contributor

rockg commented May 2, 2016

You need to add some tests to test this code path.

@codecov-io
Copy link

codecov-io commented May 2, 2016

Current coverage is 84.14%

Merging #13058 into master will not change coverage

@@             master     #13058   diff @@
==========================================
  Files           137        137          
  Lines         50261      50261          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42288      42288          
  Misses         7973       7973          
  Partials          0          0          

Powered by Codecov. Last updated by 1296ab3...db9723b

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timezones Timezone data dtype labels May 3, 2016


- Add ``errors`` flag to tz_localize, so you can silently ignore nonexistent timestamps and replace them with ``NaT`` if ``errors`` is set to ``'coerce'``. The default behaviour is still raising a NonExistentTimeError (``errors='raise'``)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on tz_localize and NonExistenTimeError. add the issue number as well.

@jreback
Copy link
Contributor

jreback commented May 3, 2016

pls add some tests.

@@ -3959,10 +3968,12 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None):
int64_t v, left, right
ndarray[int64_t] result, result_a, result_b, dst_hours
pandas_datetimestruct dts
bint infer_dst = False, is_dst = False, fill = False
bint infer_dst = False, is_dst = False, fill = False, is_coerce = errors=='coerce', is_raise = errors=='raise'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put these on another line, we don't PEP these yet

@dancsi
Copy link
Contributor Author

dancsi commented May 3, 2016

If needed, I will add some tests for DatetimeIndex tomorrow...

@dancsi
Copy link
Contributor Author

dancsi commented May 4, 2016

@jreback hey, I think I discovered a bug while writing more detailed test cases for Timestamp.tz_localize. Namely, for some US timezones, the up to 7am local time on the day DST has started, the time is shifted by 1 hour in advance

>>> pd.Timestamp('2014-03-09 06:00', tz='US/Eastern')
Timestamp('2014-03-09 07:00:00-0400', tz='US/Eastern')
>>> pd.Timestamp('2015-03-08 06:01', tz='US/Eastern')
Timestamp('2015-03-08 07:01:00-0400', tz='US/Eastern')
>>> pd.Timestamp('2014-03-09 06:00', tz='US/Pacific')
Timestamp('2014-03-09 07:00:00-0700', tz='US/Pacific')

Note that this doesn't happen after 7am that day, or in the following days

>>> pd.Timestamp('2014-04-09 06:00', tz='US/Eastern')
Timestamp('2014-04-09 06:00:00-0400', tz='US/Eastern')
>>> pd.Timestamp('2015-04-08 06:01', tz='US/Eastern')
Timestamp('2015-04-08 06:01:00-0400', tz='US/Eastern')
>>> pd.Timestamp('2014-04-09 06:00', tz='US/Pacific')
Timestamp('2014-04-09 06:00:00-0700', tz='US/Pacific')

Am I missing something, or is this really a bug? I have pushed my tests. Here is pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en

pandas: 0.18.1+10.g49f3618.dirty
nose: 1.3.7
pip: 8.1.1
setuptools: 20.9.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: 0.7.2
IPython: 4.2.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.3
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented May 4, 2016

@dancsi that last is this bug: #7825

when a tz is passed directly to the Timestamp constructor it is not handling DST correcty. All the other routines (meaning pd.to_datetime), and conversion of a naive timestamp which is then localized are fine. Its the case of a direct construction.

Well for a PR on that. (its independent of this issue).

@dancsi
Copy link
Contributor Author

dancsi commented May 4, 2016

@jreback so, I should make a PR with these tests for issue #7825, and remove them from here?

@jreback
Copy link
Contributor

jreback commented May 4, 2016

well for that issue you can add your examples, but also take the examples from that issue as well.

This PR should be independent of the other issue. IOW your tests should be testing this specific issue.

@dancsi
Copy link
Contributor Author

dancsi commented May 6, 2016

I don't understand why codecov is failing? After all, I added some tests, and they are passing...

@jreback
Copy link
Contributor

jreback commented May 6, 2016

you have to rebase against master, I fixed the .yml file.

@@ -533,6 +533,21 @@ def test_ambiguous_nat(self):
di_test = DatetimeIndex(times, tz='US/Eastern')
self.assert_numpy_array_equal(di_test, localized)

def test_nonexistent_raise_coerce(self):
from pytz.exceptions import NonExistentTimeError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

@jreback
Copy link
Contributor

jreback commented May 6, 2016

looks pretty good. couple of commets. pls rebase / squash. ping on green.

@jreback jreback added this to the 0.18.2 milestone May 6, 2016
@dancsi
Copy link
Contributor Author

dancsi commented May 6, 2016

@jreback all checks have passed. Thank you for guiding me through the process :)

@jreback jreback closed this in 5541fd7 May 7, 2016
@jreback
Copy link
Contributor

jreback commented May 7, 2016

very nice @dancsi thank you!

if you would like to work on other issues would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: add option to tz_localize to return NaT instead of raising a NonExistentTimeError
4 participants